#####################
REPLICATION MATERIALS
#####################

ARTICLE: "Discrete Choice Data with Unobserved Heterogeneity: A Conditional Binary Quantile Model"
AUTHOR: Xiao Lu
JOURNAL: Political Analysis
CONTACT: xiao.lu[.at.]gess.uni-mannheim.de
DATE: 02-04-2019

##################
TABLE OF CONTENTS:
##################

I.
- "master_distribution.R": master file for generating distributional plots (running time: 0.3973012 secs)
	- "distribution": Replication codes for comparing ALD, normal and logit distributions
		- "source.R": source functions



II.
- "master_sim.R": master file for running estimation (Simulation) (running time: 4.205141 mins)
	- "simulation": Replication codes for the simulation section
		- "sim_homo.R": Simulation I homogeneously generated data
			- running time: 3.435989 mins
		- "sim_heter.R": Simulation II heterogeneously generated data
			- running time: 46.14911 secs
		- "cbq_binary_function.R": CBQ binary functionals
		- "stan_new_models":
			- "cbq_binary_q1.stan": CBQ binary Q1
			- "cbq_binary_q2.stan": CBQ binary Q2
			- "cbq_binary_q3.stan": CBQ binary Q3
			- "cbq_binary_q4.stan": CBQ binary Q4
			- "cbq_binary_q5.stan": CBQ binary Q5
			- "cbq_binary_q6.stan": CBQ binary Q6
			- "cbq_binary_q7.stan": CBQ binary Q7
			- "cbq_binary_q8.stan": CBQ binary Q8
			- "cbq_binary_q9.stan": CBQ binary Q9



III.
- "master_eu.R": master file for running estimation (EU legislature) (running time: 1.506834 hours)
	- "replication_Rasmussen": Replication codes for EU legislature
	
		- "substantive_interpret_leg.R": substantive interpretation and counterfactuals
			- running time: 1.566854 secs
	
		- "cbq_leg_q1_s.R": CBQ binary Q1 estimation (scaled-down version)
			- running time: 4.127496 mins
		- "cbq_leg_q2_s.R": CBQ binary Q2 estimation (scaled-down version)
			- running time: 3.761277 mins
		- "cbq_leg_q3_s.R": CBQ binary Q3 estimation (scaled-down version)
			- running time: 4.210683 mins
		- "cbq_leg_q4_s.R": CBQ binary Q4 estimation (scaled-down version)
			- running time: 3.780723 mins
		- "cbq_leg_q5_s.R": CBQ binary Q5 estimation (scaled-down version)
			- running time: 3.752472 mins
		- "cbq_leg_q6_s.R": CBQ binary Q6 estimation (scaled-down version)
			- running time: 3.698184 mins
		- "cbq_leg_q7_s.R": CBQ binary Q7 estimation (scaled-down version)
			- running time: 3.728372 mins
		- "cbq_leg_q8_s.R": CBQ binary Q8 estimation (scaled-down version)
			- running time: 3.74015 mins
		- "cbq_leg_q9_s.R": CBQ binary Q9 estimation (scaled-down version)
			- running time: 59.58455 mins

		- "cbq_binary_function.R": CBQ binary functionals	
		- "replicate_Rasmussen.R": replicate the original analysis using logit	
		- "stan_new_models":
			- "cbq_binary_q1.stan": CBQ binary Q1
			- "cbq_binary_q2.stan": CBQ binary Q2
			- "cbq_binary_q3.stan": CBQ binary Q3
			- "cbq_binary_q4.stan": CBQ binary Q4
			- "cbq_binary_q5.stan": CBQ binary Q5
			- "cbq_binary_q6.stan": CBQ binary Q6
			- "cbq_binary_q7.stan": CBQ binary Q7
			- "cbq_binary_q8.stan": CBQ binary Q8
			- "cbq_binary_q9.stan": CBQ binary Q9
		- "archive": original data and codes
			- "origin_legislature.R": codes to reproduce original tables and figures



IV.
- "master_vote.R": master file for running estimation (US election) (running time: 29.24646 mins)
	- "replication_AN": Replication codes for US election studies
		
		- "substantive_interpretation_AN.R": substantive interpretation and counterfactuals
			- running time: 1.968926 mins

		- "cbq_vote_AN_q1_s.R": CBQ Q1 estimation (scaled-down version)
			- running time: 3.016919 mins
		- "cbq_vote_AN_q2_s.R": CBQ Q2 estimation (scaled-down version)
			- running time: 3.114219 mins
		- "cbq_vote_AN_q3_s.R": CBQ Q3 estimation (scaled-down version)
			- running time: 2.93425 mins
		- "cbq_vote_AN_q4_s.R": CBQ Q4 estimation (scaled-down version)
			- running time: 3.024776 mins
		- "cbq_vote_AN_q5_s.R": CBQ Q5 estimation (scaled-down version)
			- running time: 3.045839 mins
		- "cbq_vote_AN_q6_s.R": CBQ Q6 estimation (scaled-down version)
			- running time: 2.993994 mins
		- "cbq_vote_AN_q7_s.R": CBQ Q7 estimation (scaled-down version)
			- running time: 2.94597 mins
		- "cbq_vote_AN_q8_s.R": CBQ Q8 estimation (scaled-down version)
			- running time: 3.021083 mins
		- "cbq_vote_AN_q9_s.R": CBQ Q9 estimation (scaled-down version)
			- running time: 3.180487 mins

		- "replicate_AN.R": replicate the original findings by AN (embedded)
		- "cbq_function.R": CBQ functionals
		- "stan_new_models":
			- "cbq4.0_q1.stan": CBQ Q1
			- "cbq4.0_q2.stan": CBQ Q2
			- "cbq4.0_q3.stan": CBQ Q3
			- "cbq4.0_q4.stan": CBQ Q4
			- "cbq4.0_q5.stan": CBQ Q5
			- "cbq4.0_q6.stan": CBQ Q6
			- "cbq4.0_q7.stan": CBQ Q7
			- "cbq4.0_q8.stan": CBQ Q8
			- "cbq4.0_q9.stan": CBQ Q9
		- "archive": original data and codes	
			- "origin_vote.R": codes to reproduce original tables and figures



V.
- "master_coalition.R": master file for running estimation (Government formation) (running time: 21.90352 hours)
	- "replication_MS": Replication codes for government formation
		- "substantive_interpret.R": substantive interpretation and counterfactuals
		- "ms_vis.R": visualize quantile estimates
			- running time: 6.340495 secs (the above two files)

		- "cbq_ms_q1_s.R": CBQ Q1 estimation (scaled-down version)
			- running time: 2.628276 hours
		- "cbq_ms_q2_s.R": CBQ Q2 estimation (scaled-down version)
			- running time: 2.848783 hours
		- "cbq_ms_q3_s.R": CBQ Q3 estimation (scaled-down version)
			- running time: 1.619078 hours
		- "cbq_ms_q4_s.R": CBQ Q4 estimation (scaled-down version)
			- running time: 3.775715 hours
		- "cbq_ms_q5_s.R": CBQ Q5 estimation (scaled-down version)
			- running time: 1.761863 hours
		- "cbq_ms_q6_s.R": CBQ Q6 estimation (scaled-down version)
			- running time: 2.95516 hours
		- "cbq_ms_q7_s.R": CBQ Q7 estimation (scaled-down version)
			- running time: 2.07724 hours
		- "cbq_ms_q8_s.R": CBQ Q8 estimation (scaled-down version)
			- running time: 2.355557 hours
		- "cbq_ms_q9_s.R": CBQ Q9 estimation (scaled-down version)
			- running time: 1.880086 hours

		- "counterfactual.dta": counterfactuals from original estimation
		- "prob_diff_GGG.dta": predictions by GGG using MXL
		- "cbq_function": CBQ functionals
		- "stan_new_models":
			- "cbq4.0_q1.stan": CBQ Q1
			- "cbq4.0_q2.stan": CBQ Q2
			- "cbq4.0_q3.stan": CBQ Q3
			- "cbq4.0_q4.stan": CBQ Q4
			- "cbq4.0_q5.stan": CBQ Q5
			- "cbq4.0_q6.stan": CBQ Q6
			- "cbq4.0_q7.stan": CBQ Q7
			- "cbq4.0_q8.stan": CBQ Q8
			- "cbq4.0_q9.stan": CBQ Q9
		- "archive": original data and codes
			- "origin_coalition.R": codes to reproduce original tables and figures


#############
INSTRUCTIONS:
#############

CAUTION: 

- For the three real-world applications, the scaled-down files were used for PA replication purposes. MCMC convergence is not guaranteed for those files. Please increase the numbers of iterations and chains to check for full convergence.
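
A minimal R sketch of how convergence might be checked after raising the numbers of chains and iterations. The helper name flag_unconverged, the 1.1 threshold, and the object names in the commented usage (model_file, stan_data) are assumptions for illustration and are not taken from the replication scripts:

```r
# Hypothetical helper: given a named numeric vector of Rhat values, return the
# names of parameters whose Rhat exceeds the threshold or is NA.
flag_unconverged <- function(rhat, threshold = 1.1) {
  bad <- is.na(rhat) | rhat > threshold
  names(rhat)[bad]
}

# Possible usage with rstan (placeholder values, not the settings of the paper):
# library(rstan)
# fit  <- stan(file = model_file, data = stan_data, chains = 4, iter = 4000)
# rhat <- summary(fit)$summary[, "Rhat"]
# flag_unconverged(rhat)
```

An empty result suggests, but does not prove, that the chains have mixed; trace plots remain worth inspecting.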

- Estimation data and codes with full convergence are stored in the archive folders. In order to reproduce the original tables and figures, please uncomment the line at the bottom of each master file.

- To run the analyses, execute the following master R files: 
(Please set the working directory of each master file to the location of the file on your computer.)
	- Distributional illustration: "master_distribution.R"
	- Simulation: "master_sim.R"
	- EU legislature: "master_eu.R"
	- US election: "master_vote.R"
	- Government formation: "master_coalition.R"
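
One possible way to set the working directory and run a master file in a single step is sketched below. The helper run_master is hypothetical and not part of the replication materials, and the example path is a placeholder:

```r
# Hypothetical helper: run a master file from its own directory and restore
# the previous working directory afterwards.
run_master <- function(master_path) {
  stopifnot(file.exists(master_path))
  master_path <- normalizePath(master_path)
  old_wd <- setwd(dirname(master_path))   # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)      # restore it even if the script fails
  source(basename(master_path), echo = FALSE)
  invisible(master_path)
}

# Example (adjust the path to where the materials were unpacked):
# run_master("~/replication/master_sim.R")
```

Restoring the working directory on exit keeps later calls independent of the order in which the master files are run.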

- The original estimations were run in parallel on a remote Linux high-performance computing cluster equipped with Intel Xeon E5-2640v3 CPUs. For PA replication purposes, the original codes were scaled down (the numbers of chains and iterations were reduced and the codes were run sequentially) and run on a local iMac Pro equipped with a 10-core Intel Xeon W-2150B CPU. The reported running times are based on this local machine and are only very rough indicators. Due to the large data size, each original CBQ estimation of government formation can take up to 6 days on the high-performance computing cluster, so please plan accordingly.

- Users may encounter an "irrecoverable exception" error when running "master_sim.R". This is an old problem related to the C++ compiler that occurs when exiting R after running multiple Stan models in a single file. Some explanations are available at https://github.com/stan-dev/rstan/issues/296. The error is unrelated to the models and the results, and it can occur unexpectedly. It does not appear to be fully resolved yet.

- The low- and high-quantile interpretations depend on the assumed error structure, i.e., whether epsilon ~ ALD or -epsilon ~ ALD. Because the ALD is asymmetric, estimates from quantile q under one error structure can be the same as those from quantile (1 - q) under the other. Therefore, I recommend drawing counterfactual scenarios for substantive interpretations that are logically coherent with the chosen error structure.
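
The relation between quantile q and quantile (1 - q) can be illustrated numerically. The sketch below assumes the standard ALD parametrization with location 0 and scale 1, which may differ from the parametrization used in the paper. The function name pald is introduced here for illustration only:

```r
# CDF of a standard asymmetric Laplace distribution (location 0, scale 1)
# with skewness parameter p in (0, 1), so that P(X <= 0) = p.
pald <- function(x, p) {
  ifelse(x <= 0,
         p * exp((1 - p) * x),
         1 - (1 - p) * exp(-p * x))
}

q <- 0.2
x <- seq(-5, 5, by = 0.5)

cdf_neg_eps <- 1 - pald(-x, q)   # P(-epsilon <= x) when epsilon ~ ALD(q)
cdf_flipped <- pald(x, 1 - q)    # CDF of ALD(1 - q)

# Largest discrepancy; should be zero up to floating-point error.
max_diff <- max(abs(cdf_neg_eps - cdf_flipped))
```

Under this parametrization, negating the error term is equivalent to swapping q for (1 - q), which is why the two error structures mirror each other.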

- The following replication data were downloaded and stored in the respective folders. They are provided as part of the Dataverse material and do not need to be downloaded manually, with one exception: "formation_new.tab". Dataverse automatically converts ".tab" into ".tsv" format, so the downloaded material contains the incorrect "formation_new.tsv" in the folder "replication_MS". To replace this ".tsv" version with the correct ".tab" version, please locate "formation_new.tab" on the Dataverse page, click "Download" and select "Tab-Delimited". When saving the file, please make sure that it has the correct extension (".tab") and place it in the folder "replication_MS".
	- EU legislature. "EUP388675-supplementary_material.dta": replication data downloaded from http://journals.sagepub.com/doi/suppl/10.1177/1465116510388675. Rasmussen, A. (2010). Early Conclusion in Bicameral Bargaining: Evidence from the Co-decision Legislative Procedure of the European Union. European Union Politics 12 (1), 41-64.
	- US election. "nes9212r.asc": replication data downloaded from https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/1112. Alvarez, R. M. and J. Nagler (1995). Economics, Issues and the Perot Candidacy: Voter Choice in the 1992 Presidential Election. American Journal of Political Science 39 (3), 714-744.
		- To access the data: go to https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/1112, click the download button below "Version V1" and select "Other" rather than "Document Only". You can then download a zip file named "ICPSR_01112-V1.zip". Please unzip it, open the folders in the order "ICPSR_01112 3" -> "DS0001", unzip the file "01112-0001-Zipped_package.zip", open the folder "s1112", unzip "alvar95.zip", and open the folder "alvar95", where you will find the file "nes9212r.asc".
	- Government formation. "formation_new.tab": replication data downloaded from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HJCPDK. Attributed to Martin, L. W. and R. T. Stevenson (2001). Government Formation in Parliamentary Democracies. American Journal of Political Science 45 (1), 33-50, and to Glasgow, G., M. Golder, and S. N. Golder (2012). New Empirical Strategies for the Study of Parliamentary Government Formation. Political Analysis 20, 248-270.
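
Before running "master_coalition.R", it may help to confirm that the ".tab" version of the government formation data is in place. The check below is a sketch. The function name check_formation_data and its return labels are assumptions for illustration:

```r
# Hypothetical check: report whether the ".tab" file is present in the folder,
# only the auto-converted ".tsv" file is present, or neither file is found.
check_formation_data <- function(folder = "replication_MS") {
  tab <- file.path(folder, "formation_new.tab")
  tsv <- file.path(folder, "formation_new.tsv")
  if (file.exists(tab)) return("ok")
  if (file.exists(tsv)) return("tsv_only")
  "missing"
}

# Example (run from the directory that contains "replication_MS"):
# check_formation_data()
```

A result of "tsv_only" indicates that the manual download described above is still needed.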

The following R packages are required to run the analyses:
- rstan
- readstata13
- mlogit
- ggplot2



If you have any further questions concerning the replication materials, please email the author (Xiao Lu) at xiao.lu[.at.]gess.uni-mannheim.de.

